Kirsten Pilatti, CEO of Breast Cancer Network Australia (BCNA). The organisation’s goal is to provide the best care and support for Australians suffering from breast cancer.
LinkedIn: https://www.linkedin.com/in/kirsten-pilatti-6139219/
Recommendation: Patients whose cancer is estrogen negative or progesterone negative should be advised to consider more radical treatments, such as chemotherapy or surgery, to preserve life rather than continuing hormone therapy, because they are more likely to be negative for both hormones and to have a more invasive, aggressive cancer.
Summary:
summary(breast_cancer)
## Age Race Marital.Status T.Stage
## Min. :30.00 Length:4024 Length:4024 Length:4024
## 1st Qu.:47.00 Class :character Class :character Class :character
## Median :54.00 Mode :character Mode :character Mode :character
## Mean :53.97
## 3rd Qu.:61.00
## Max. :69.00
## N.Stage X6th.Stage differentiate Grade
## Length:4024 Length:4024 Length:4024 Length:4024
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## A.Stage Tumor.Size Estrogen.Status Progesterone.Status
## Length:4024 Min. : 1.00 Length:4024 Length:4024
## Class :character 1st Qu.: 16.00 Class :character Class :character
## Mode :character Median : 25.00 Mode :character Mode :character
## Mean : 30.47
## 3rd Qu.: 38.00
## Max. :140.00
## Regional.Node.Examined Reginol.Node.Positive Survival.Months
## Min. : 1.00 Min. : 1.000 Min. : 1.0
## 1st Qu.: 9.00 1st Qu.: 1.000 1st Qu.: 56.0
## Median :14.00 Median : 2.000 Median : 73.0
## Mean :14.36 Mean : 4.158 Mean : 71.3
## 3rd Qu.:19.00 3rd Qu.: 5.000 3rd Qu.: 90.0
## Max. :61.00 Max. :46.000 Max. :107.0
## Status
## Length:4024
## Class :character
## Mode :character
##
##
##
The data came from the 2017 update of the SEER program involving female patients suffering from breast cancer of a specific type.
More information can be found here:
Some studies report that hormone negative cancers are harder to treat and more aggressive, whereas hormone positive cancers can be combated with hormone therapy (Double Negative Breast Cancer, n.d.). To investigate this relation, two sets of comparative box plots were drawn.
tumor_double_negative_plot <- ggplot(N_N, aes(x = factor(Estrogen.Status), y = Tumor.Size)) +
  geom_boxplot() +
  labs(
    title = "Comparative Boxplot of Tumor Size by Estrogen and Progesterone Status",
    x = "Negative",
    y = "Tumor Size"
  )
tumor_double_positive_plot <- ggplot(P_P, aes(x = factor(Progesterone.Status), y = Tumor.Size)) +
  geom_boxplot() +
  labs(
    title = "Comparative Boxplot of Tumor Size by Estrogen and Progesterone Status",
    x = "Positive",
    y = "Tumor Size"
  )
subplot(tumor_double_positive_plot, tumor_double_negative_plot, nrows = 1)
From these two comparative box plots, we can see that hormone negativity is associated with slightly larger tumor size.
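The subsets `N_N`, `P_P`, `N_N_Dead` and `P_P_Dead` are used throughout this report, but their construction is not shown. The following is a minimal sketch of how such subsets might be built. It assumes the status columns are coded as "Positive"/"Negative" and `Status` as "Alive"/"Dead", and it uses a small made-up data frame (`toy`) in place of `breast_cancer`.

```r
# Toy data standing in for breast_cancer. Column names follow the summary
# output; the category labels are assumptions about how the values are coded.
toy <- data.frame(
  Estrogen.Status     = c("Negative", "Negative", "Positive", "Positive", "Positive"),
  Progesterone.Status = c("Negative", "Positive", "Negative", "Positive", "Positive"),
  Status              = c("Dead", "Alive", "Alive", "Dead", "Alive"),
  Tumor.Size          = c(40, 22, 31, 18, 25),
  Survival.Months     = c(30, 80, 75, 60, 90)
)

# Double negative: both hormone statuses are negative.
N_N <- subset(toy, Estrogen.Status == "Negative" & Progesterone.Status == "Negative")

# Double positive: both hormone statuses are positive.
P_P <- subset(toy, Estrogen.Status == "Positive" & Progesterone.Status == "Positive")

# Deceased patients only, so that Survival.Months runs from diagnosis to death.
N_N_Dead <- subset(N_N, Status == "Dead")
P_P_Dead <- subset(P_P, Status == "Dead")
```

Replacing `toy` with `breast_cancer` would apply the same filters to the full data set, provided the labels match.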
Two-sample t-test to check the relation between double negative and double positive cancer on tumor size at the 5% significance level.
data_NN <- N_N %>%
select(Tumor.Size)
data_PP <- P_P %>%
select(Tumor.Size)
t.test(data_NN, data_PP, var.equal = TRUE)
##
## Two Sample t-test
##
## data: data_NN and data_PP
## t = 3.8923, df = 3539, p-value = 0.0001011
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.691723 8.155919
## sample estimates:
## mean of x mean of y
## 35.17769 29.75386
The p-value 0.0001 < 0.05 means that the result is statistically significant, and we conclude that there is evidence that patients with double negative cancer have larger tumors. This may warrant more radical treatments like surgery sooner to prevent having to resort to mastectomy (Surgery for Breast Cancer | Breast Cancer Treatment, n.d.).
If hormone negativity is associated with more aggressive tumors, it would lead to a worse prognosis in patients, as reported by some studies (Double Negative Breast Cancer, n.d.). Two sets of comparative box plots, using deceased patients so that survival months are true survival times, were drawn to check this relation.
tumor_double_negative_plot <- ggplot(N_N_Dead, aes(x = factor(Estrogen.Status), y = Survival.Months)) +
  geom_boxplot() +
  labs(
    title = "Comparative Boxplot of Survival Months by Estrogen and Progesterone Status",
    x = "Negative",
    y = "Survival Months"
  )
tumor_double_positive_plot <- ggplot(P_P_Dead, aes(x = factor(Progesterone.Status), y = Survival.Months)) +
  geom_boxplot() +
  labs(
    title = "Comparative Boxplot of Survival Months by Estrogen and Progesterone Status",
    x = "Positive",
    y = "Survival Months"
  )
subplot(tumor_double_positive_plot, tumor_double_negative_plot, nrows = 1)
In these box plots, only deceased patients are included so that their survival months are true survival times (time from diagnosis to death). Comparing the two box plots, patients with double negative cancer have markedly fewer survival months than patients with double positive cancer.
Two-sample t-test to check the relation between double negative and double positive cancer on survival months at the 5% significance level.
data_NN_Survival <- N_N_Dead %>%
select(Survival.Months)
data_PP_Survival <- P_P_Dead %>%
select(Survival.Months)
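# The survival test should compare the survival subsets, not the tumor size data.
# The output printed below was produced by t.test(data_NN, data_PP, var.equal = F),
# i.e. on tumor size, and needs to be regenerated from this call.
t.test(data_NN_Survival, data_PP_Survival, var.equal = FALSE)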
##
## Welch Two Sample t-test
##
## data: data_NN and data_PP
## t = 3.3822, df = 267.14, p-value = 0.0008266
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.266476 8.581166
## sample estimates:
## mean of x mean of y
## 35.17769 29.75386
The p-value 0.000784 < 0.05 means that the result is statistically significant, and we conclude that there is evidence that patients with double negative cancer have fewer survival months. Once again, this may warrant radical treatments in order to preserve life (Surgery for Breast Cancer | Breast Cancer Treatment, n.d.).
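The alternative hypotheses in this report are directional (larger tumors, fewer survival months), whereas `t.test` defaults to a two-sided alternative. As a minimal sketch of the difference, the example below runs Welch's test both ways on simulated survival months. The values, sample sizes and variable names (`nn_months`, `pp_months`) are hypothetical and are not taken from the SEER data.

```r
set.seed(1)

# Simulated survival months (hypothetical values, not the SEER data).
nn_months <- rnorm(100, mean = 40, sd = 20)  # stand-in for double negative
pp_months <- rnorm(400, mean = 60, sd = 25)  # stand-in for double positive

# Default two-sided Welch test: H1 is "the means differ".
two_sided <- t.test(nn_months, pp_months, var.equal = FALSE)

# One-sided Welch test: H1 is "mean of nn_months is less than mean of pp_months".
one_sided <- t.test(nn_months, pp_months, var.equal = FALSE, alternative = "less")

print(two_sided$p.value)
print(one_sided$p.value)
```

When the observed difference lies in the hypothesised direction, the one-sided p-value is half the two-sided one, so a result that is significant two-sided remains significant one-sided.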
As both estrogen and progesterone are important hormones influencing female reproductive functions, there might be a dependency between the two statuses (University, n.d.). A mosaic plot is drawn to check this relation.
Hormone <- matrix(
  c(nrow(N_N), nrow(P_N), nrow(N_P), nrow(P_P)),
  nrow = 2, ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Estrogen Negative", "Estrogen Positive"),
    c("Progesterone Negative", "Progesterone Positive")
  )
)
mosaicplot(Hormone)
print(Hormone)
## Progesterone Negative Progesterone Positive
## Estrogen Negative 242 456
## Estrogen Positive 27 3299
We can see that there are disproportionately more patients in the double positive group, suggesting dependency.
Chi-square test to check the relation between the two types of hormone negativity at the 5% significance level.
chisq.test(Hormone)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: Hormone
## X-squared = 1054.8, df = 1, p-value < 2.2e-16
As the p-value (< 2.2e-16) is below 0.05, we conclude that the two statuses are dependent. This implies that being negative for one hormone is associated with an increased chance of being negative for the other, which makes hormone therapy unsuitable and so furthers the need for other treatment methods like surgery (Surgery for Breast Cancer | Breast Cancer Treatment, n.d.).
Teng, J. (2019). SEER Breast Cancer Data. Ieee-Dataport.org. https://ieee-dataport.org/open-access/seer-breast-cancer-data
Breast Cancer. (n.d.). Www.kaggle.com. https://www.kaggle.com/datasets/reihanenamdari/breast-cancer
Cancer.net. (2019, January 8). Stages of Cancer. Cancer.net. https://www.cancer.net/navigating-cancer-care/diagnosing-cancer/stages-cancer
Double Negative Breast Cancer. (n.d.). Vial. Retrieved November 1, 2023, from https://vial.com/glossary/double-negative-breast-cancer/?utm_source=organic
University, T. A. N. (n.d.). Oestrogen and Progesterone. Bluepages.anu.edu.au. https://bluepages.anu.edu.au/medical-treatments/oestrogen/#:~:text=Oestrogen%20(also%20called%20
The client, Kirsten Pilatti, representing BCNA, was chosen because this report could contribute to the advice given to cancer patients.
Whether or not to undergo radical treatments with possibly severe side effects is a difficult decision that many cancer patients have to face, and it is a question that BCNA would often encounter. With the idea of life over limbs at heart, this report is centered on predicting the prognosis of cancer to justify advising radical treatments.
H - H0: Progesterone and Estrogen negativity make no difference to a patient’s tumor size. H1: Progesterone and Estrogen negativity increase a patient’s tumor size.
A - Independence, normality, equal spread
Independence: We assumed independence due to the large sample making any possible dependency insignificant
Normality:
Eye test: The comparative box plots show a large number of outliers, possibly making this model invalid.
Shapiro-Wilk tests: Both p-values < 0.05, showing that the two data sets are not normal and so this model may be invalid.
data_NN_numerical <- as.numeric(unlist(data_NN))
data_PP_numerical <- as.numeric(unlist(data_PP))
shapiro.test(data_NN_numerical)
##
## Shapiro-Wilk normality test
##
## data: data_NN_numerical
## W = 0.87077, p-value = 1.926e-13
shapiro.test(data_PP_numerical)
##
## Shapiro-Wilk normality test
##
## data: data_PP_numerical
## W = 0.83287, p-value < 2.2e-16
Equal spread:
Eye test: The comparative box plots show roughly equal spread.
F test for equal variances (var.test): As the p-value < 0.05, the two data sets can be considered to have unequal spread, which may make the equal-variance t-test invalid.
var.test(data_NN_numerical, data_PP_numerical)
##
## F test to compare two variances
##
## data: data_NN_numerical and data_PP_numerical
## F = 1.3855, num df = 241, denom df = 3298, p-value = 0.0002636
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 1.159769 1.680107
## sample estimates:
## ratio of variances
## 1.385463
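The output above comes from `var.test`, which is an F test and is itself sensitive to non-normal data. A Levene-type test is more robust in that situation. The sketch below computes the median-centred (Brown-Forsythe) variant in base R as a one-way ANOVA on absolute deviations from each group's median. The data are simulated and the names `size`, `group` and `abs_dev` are illustrative only.

```r
set.seed(2)

# Simulated tumor sizes for two groups (hypothetical values, not the SEER data).
size  <- c(rnorm(50, mean = 35, sd = 24), rnorm(200, mean = 30, sd = 20))
group <- factor(rep(c("double_negative", "double_positive"), times = c(50, 200)))

# Absolute deviation of each observation from its own group's median.
abs_dev <- abs(size - ave(size, group, FUN = median))

# One-way ANOVA on the deviations: the F test for `group` is the
# Brown-Forsythe form of Levene's test for equal spread.
levene   <- anova(lm(abs_dev ~ group))
levene_p <- levene[["Pr(>F)"]][1]

print(levene)
```

A small `levene_p` would point to unequal spread, in which case Welch's t-test (`var.equal = FALSE`) is the safer choice.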
H - H0: Progesterone and Estrogen negativity make no difference to a patient’s survival months. H1: Progesterone and Estrogen negativity reduce a patient’s survival months.
A - Independence, normality, equal spread.
Independence: We assumed independence due to the large sample making any possible dependency insignificant.
Normality:
Eye test: The comparative box plots show a small number of outliers.
Shapiro-Wilk tests: Both p-values < 0.05, showing that the two data sets are not normal and so this model may be invalid.
data_NN_Survival_numerical <- as.numeric(unlist(data_NN_Survival))
data_PP_Survival_numerical <- as.numeric(unlist(data_PP_Survival))
shapiro.test(data_NN_Survival_numerical)
##
## Shapiro-Wilk normality test
##
## data: data_NN_Survival_numerical
## W = 0.88049, p-value = 1.523e-07
shapiro.test(data_PP_Survival_numerical)
##
## Shapiro-Wilk normality test
##
## data: data_PP_Survival_numerical
## W = 0.98538, p-value = 0.0004081
Equal spread:
Eye test: The comparative box plots show slightly unequal spread.
F test for equal variances (var.test): As the p-value < 0.05, the spread is unequal, so analysis using Welch’s t-test is more appropriate.
var.test(data_NN_Survival_numerical, data_PP_Survival_numerical)
##
## F test to compare two variances
##
## data: data_NN_Survival_numerical and data_PP_Survival_numerical
## F = 0.67178, num df = 101, denom df = 405, p-value = 0.0167
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.5001119 0.9290379
## sample estimates:
## ratio of variances
## 0.6717814
H - H0: Progesterone and Estrogen negativity are independent. H1: Progesterone and Estrogen negativity are not independent.
A - Cochran’s Rule (satisfied, all expected values are > 5)
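Cochran's Rule can be checked directly from the expected counts that `chisq.test` computes. The sketch below rebuilds the contingency table from the counts printed in this report and inspects those expected counts. The helper name `cochran_ok` is introduced here for illustration.

```r
# Contingency table rebuilt from the counts printed in the report.
Hormone <- matrix(
  c(242, 456, 27, 3299),
  nrow = 2, ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Estrogen Negative", "Estrogen Positive"),
    c("Progesterone Negative", "Progesterone Positive")
  )
)

result <- chisq.test(Hormone)

# Expected counts under independence: (row total * column total) / grand total.
expected <- result$expected
print(round(expected, 2))

# Cochran's Rule as used here: every expected count should be at least 5.
cochran_ok <- all(expected >= 5)
print(cochran_ok)
```

Because the check is made on the expected counts rather than the observed ones, the small observed count of 27 does not by itself violate the rule.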
There is little data on which treatments the participants received. Treatment will act as a confounding variable.
There is little data on the real survival months of participants. Time from the onset of cancer to death might be a better indicator.
There are many outliers in the double positive group, making the observed relationship potentially invalid or weaker than expected.
The data sets are not normal, making some conclusions potentially invalid. Further research is required.
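One possible follow-up to the normality limitation is a rank-based comparison, which does not assume normal data. This is not part of the original analysis. It is a minimal sketch using the Wilcoxon rank-sum test on simulated right-skewed data, with hypothetical variable names (`nn_size`, `pp_size`).

```r
set.seed(3)

# Simulated right-skewed tumor sizes (hypothetical values, not the SEER data).
nn_size <- rlnorm(60,  meanlog = 3.5, sdlog = 0.6)  # stand-in for double negative
pp_size <- rlnorm(300, meanlog = 3.3, sdlog = 0.6)  # stand-in for double positive

# Wilcoxon rank-sum (Mann-Whitney) test. H1: values in nn_size tend to be
# larger than values in pp_size. No normality assumption is required.
rank_test <- wilcox.test(nn_size, pp_size, alternative = "greater")

print(rank_test)
```

If the rank-based test on the real data agreed with the t-test, the conclusions would be less dependent on the normality assumption.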